🔄 Reinforcement Learning - wavage · Scour

Reinforcement Learning from Human Feedback

arxiv.org·18h

On Computation and Reinforcement Learning

arxiv.org·2d

Hybrid neural–cognitive models reveal how memory shapes human reward learning

nature.com·23h

Why reinforcement learning breaks at scale, and how a new method fixes it

techxplore.com·3d

Learning Models with Uniform Performance via Distributionally Robust Optimization

dev.to·19h

i10e-lab/HelloRL: A fully modular framework to make Reinforcement Learning quick and easy

github.com·1d

Dynamic Constraint‑Aware Multi‑Agent Reinforcement Learning for Real‑Time Urban Traffic Signal Control

Abstract: Urban traffic management demands responsi...

freederia.com·2d

Your Agent Is Slow Because of Inference

futureagi.com·1d

Barn Owls Know When to Wait (iuSTDP part 2)

blog.typeobject.com·11h

Quantization-Aware Distillation

ternarysearch.blogspot.com·6h

Meta-Optimized Continual Adaptation for deep-sea exploration habitat design with embodied agent feedback loops

dev.to·11h

Rethinking imitation learning with Predictive Inverse Dynamics Models

microsoft.com·2d

Robust Hierarchical Reinforcement Learning for Bipedal Robots Performing Dynamic Balance on Sloped Terrains under Partial Sensor Failure

freederia.com·2d

On Economics of A(S)I Agents

lesswrong.com·13h

Distributed Reinforcement Learning for Scalable High-Performance Policy Optimization

towardsdatascience.com·6d

Oatmeal - Constraint propagation for fun

eli.li·5h

Scientists reveal the alien logic of AI: hyper-rational but stumped by simple concepts

psypost.org·10h

Exploiting large language model with reinforcement learning for generative job recommendations

eurekalert.org·2d

Human-like Search for Modern Applications

anvitra.ai·5h

AI Workflows with human-in-the-loop

weavemind.ai·20m
